A stochastic algorithm for the recursive approximation of the
location $\theta$ of a maximum of a regression function was introduced
by Kiefer and Wolfowitz [Ann. Math. Statist. 23 (1952) 462-466]
in the univariate framework, and by Blum [Ann. Math. Statist. 25
(1954) 737-744] in the multivariate case. The aim of this paper is
to provide a companion algorithm to the Kiefer-Wolfowitz-Blum
algorithm, which allows one to simultaneously recursively approximate
the size $\mu$ of the maximum of the regression function. A precise study
of the joint weak convergence rate of both algorithms is given; it turns
out that, unlike the location of the maximum, the size of the maximum
can be approximated by an algorithm which converges at the
parametric rate. Moreover, averaging leads to an asymptotically
efficient algorithm for the approximation of the couple $(\theta, \mu)$.

1. Introduction. Consider two random variables X and Z with values
in $\mathbb{R}^d$ and $\mathbb{R}$, respectively, that have unknown common distribution $P_{X,Z}$.
Assume that the regression function $f(\cdot) = E(Z|X = \cdot) : \mathbb{R}^d \to \mathbb{R}$ exists, is
sufficiently smooth and has a unique maximizer $\theta \in \mathbb{R}^d$,

    $\theta = \arg\max_{x \in \mathbb{R}^d} E(Z|X = x),$

and assume that observations $Z(x)$ of $f(x)$ are available at any level x [$Z(x)$
has conditional distribution $\mathcal{L}(Z|X = x)$]. Kiefer and Wolfowitz [15] (in the
case d = 1) and Blum [1] (in the case d ≥ 1) have introduced an algorithm
which allows one to recursively approximate $\theta$. Their procedure consists in
running the recursion

(1)  $\theta_{n+1} = \theta_n + a_n Y_n,$

where $(a_n)$ is a positive nonrandom sequence that goes to zero as n goes to
infinity, and $Y_n$ is a (random) approximation of $\nabla f(\theta_n)$, the gradient of f
at the point $\theta_n$. More precisely, let $(c_n)$ be a positive nonrandom sequence
that goes to zero, and let $(e_1, \dots, e_d)$ denote the canonical basis of $\mathbb{R}^d$; the
approximation $Y_n$ introduced by Kiefer and Wolfowitz [15] and Blum [1] is
the d-dimensional vector

    $Y_n = \frac{1}{2c_n}\{Z(\theta_n + c_n e_i) - Z(\theta_n - c_n e_i)\}_{i \in \{1,\dots,d\}}.$
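As a numerical illustration of recursion (1), the sketch below runs the finite-difference scheme on a toy problem. Everything specific in it is an assumption made for the example and does not come from the paper: the regression function $f(x) = -\|x - \theta\|^2$, the Gaussian noise, the constants `a0`, `c0`, `sigma`, and the function names.

```python
import random

def observe(x, theta, sigma, rng):
    """Noisy observation Z(x) of the toy regression function f(x) = -||x - theta||^2."""
    f_x = -sum((xi - ti) ** 2 for xi, ti in zip(x, theta))
    return f_x + rng.gauss(0.0, sigma)

def kiefer_wolfowitz(theta, n_iter=5000, a0=1.0, c0=1.0, sigma=0.5, seed=0):
    """Iterate theta_{n+1} = theta_n + a_n * Y_n with a_n = a0/n, c_n = c0 * n**(-1/6)."""
    rng = random.Random(seed)
    d = len(theta)
    est = [0.0] * d  # arbitrary starting point
    for n in range(1, n_iter + 1):
        a_n = a0 / n
        c_n = c0 * n ** (-1.0 / 6.0)
        y_n = []
        for i in range(d):
            up = est[:i] + [est[i] + c_n] + est[i + 1:]
            down = est[:i] + [est[i] - c_n] + est[i + 1:]
            # i-th coordinate of Y_n: symmetric difference of two noisy observations
            diff = observe(up, theta, sigma, rng) - observe(down, theta, sigma, rng)
            y_n.append(diff / (2.0 * c_n))
        est = [e + a_n * y for e, y in zip(est, y_n)]
    return est

if __name__ == "__main__":
    print(kiefer_wolfowitz([1.0, -0.5]))
```

Each iteration uses 2d observations, one pair per coordinate direction; the maximizer `theta` is passed in only so that the toy observations can be simulated.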

Kiefer and Wolfowitz [15] proved the convergence in probability of $\theta_n$ to $\theta$
and Blum [1] established its almost sure convergence. Their algorithm (1)
has since been widely studied and their pioneering work extended in many 
directions. Among many, let us cite Fabian [11], Kushner and Clark [17], Hall 
and Heyde [13], Ruppert [31], Chen [3], Spall [33, 34], Polyak and Tsybakov 
[30], Dippon and Renz [8], Pelletier [26], Chen, Duncan and Pasik-Duncan 
[4] and Dippon [6]. 

As noted by Kiefer and Wolfowitz [15], the statistical importance of
approximating the maximizer $\theta$ of the regression function f is obvious and need
not be discussed. Although the approximation of the size of the maximum,
that is, of the parameter $\mu = f(\theta)$, seems important as well, this problem has,
as far as we know, never been considered. The aim of this paper is to
propose an algorithm which, by using the approximation $\theta_n$ of $\theta$ defined by (1),
allows one to simultaneously recursively approximate $\mu$ by a sequence $\mu_n$
that converges almost surely to $\mu$, and to study the joint weak convergence
rate of $\theta_n$ and $\mu_n$.

The algorithm we present to approximate $\mu$ is defined by

(2)  $\mu_{n+1} = (1 - \tilde a_n)\mu_n + \tilde a_n \tilde Y_n,$

where $(\tilde a_n)$ is a positive nonrandom sequence that goes to zero as n goes to
infinity, and $\tilde Y_n$ is an approximation of $f(\theta_n)$. This approximation method
has certain similarities to the sequential procedure for estimating
discontinuities of a regression function or surface proposed by Hall and Molchanov
[14]. A first way to approximate $f(\theta_n)$ is to take the average of the
observations of $f(\theta_n + c_n e_i)$ and $f(\theta_n - c_n e_i)$ used for the computation of $Y_n$;
all these observations or only a symmetric part of them may be used. More
precisely, let S denote a (nonempty) subset of {1, 2, ..., d}, and define the
real-valued random sequence $(\tilde Y_n)$ by

    $\tilde Y_n = \frac{1}{\delta}\sum_{i \in S}\{Z(\theta_n + c_n e_i) + Z(\theta_n - c_n e_i)\},$

where $\delta$ is twice the number of elements in S. Note that if the step
size in (2) is chosen such that $(\tilde a_n) = (n^{-1})$ and if S = {1, 2, ..., d}, then $\mu_{n+1}$
is simply the average of all the observations made for the approximation $\theta_n$
of $\theta$, that is,

    $\mu_{n+1} = \frac{1}{n}\sum_{k=1}^{n}\tilde Y_k = \frac{1}{2dn}\sum_{i \in \{1,\dots,d\},\, k \in \{1,\dots,n\}}\{Z(\theta_k + c_k e_i) + Z(\theta_k - c_k e_i)\}.$
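The averaging identity above can be checked directly. The sketch below applies recursion (2) with the step size $1/n$ to an arbitrary list of values standing in for the approximations of $f(\theta_n)$; the function name and the sample values are hypothetical.

```python
def companion_recursion(y_values, mu_init=0.0):
    """Apply mu_{n+1} = (1 - 1/n) * mu_n + (1/n) * y_n for n = 1, 2, ..."""
    mu = mu_init
    for n, y in enumerate(y_values, start=1):
        step = 1.0 / n
        mu = (1.0 - step) * mu + step * y
    return mu

if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0, 4.0]
    # The first step has weight 1, so the starting value is forgotten
    # and the recursion returns the plain average of the inputs.
    print(companion_recursion(ys, mu_init=100.0), sum(ys) / len(ys))
```

With this step size the initial value plays no role, which is why the recursion reduces to an empirical mean.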

We prove that, under suitable assumptions, $\mu_n$ converges almost surely to $\mu$.
Moreover, we study the weak convergence rate of the couple $(\theta_n - \theta, \mu_n - \mu)$.
As was already well known, the optimal convergence rate of $\theta_n$ (which is
$n^{1/3}$) is obtained by choosing in (1) $(a_n) = (a_0 n^{-1})$ with adequate conditions
on $a_0$, and $(c_n) = (c_0 n^{-1/6})$, $c_0 > 0$; setting $(\tilde a_n) = (\tilde a_0 n^{-1})$, $\tilde a_0 > 1/2$, in (2)
then makes $\mu_n$ converge with the rate $n^{1/3}$ also. Now, other choices of $(c_n)$
in (1) and (2) allow one to obtain a convergence rate for $\mu_n$ close to (but less
than) the parametric rate $\sqrt{n}$; however, in this case, the convergence rate of
$\theta_n$ becomes close to $n^{1/4}$. This observation makes clear the drawback of the
double algorithm (1) and (2): when choosing the sequence $(c_n)$ [or, in other
words, the points where the observations $Z(\theta_k \pm c_k e_i)$ of $f(\theta_k \pm c_k e_i)$ are
taken], a compromise must be made since both sequences $\theta_n$ and $\mu_n$ cannot
simultaneously converge at the optimal rate.
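The compromise can be made explicit by a heuristic order-of-magnitude computation. The sketch below assumes $a_n = a_0 n^{-1}$, $\tilde a_n = n^{-1}$ and $c_n = c_0 n^{-\tau}$, ignores logarithmic factors, and uses the decomposition of each error into a stochastic term and a bias term of order $c_n^2$ that is established in Section 3; it is an illustration, not a proof.

```latex
\begin{align*}
\theta_n - \theta &= O_P\bigl(\sqrt{a_n c_n^{-2}}\bigr) + O\bigl(c_n^{2}\bigr)
   = O_P\bigl(n^{-(1-2\tau)/2}\bigr) + O\bigl(n^{-2\tau}\bigr), \\
\mu_n - \mu &= O_P\bigl(\sqrt{\tilde a_n}\bigr) + O\bigl(c_n^{2}\bigr)
   = O_P\bigl(n^{-1/2}\bigr) + O\bigl(n^{-2\tau}\bigr).
\end{align*}
% Balancing the two terms for \theta_n: (1-2\tau)/2 = 2\tau, i.e. \tau = 1/6,
% which gives the rate n^{1/3} for \theta_n and, through the bias n^{-2\tau}, for \mu_n.
% Letting \tau increase to 1/4 brings the rate n^{2\tau} of \mu_n close to n^{1/2},
% but the rate of \theta_n, n^{(1-2\tau)/2}, then drops to about n^{1/4}.
```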

The way to address this drawback is of course not to use the same sequence
$(c_n)$ (i.e., to use different observations) for the approximation of $\nabla f(\theta_n)$ in
(1) on the one hand, and for the approximation of $f(\theta_n)$ in (2) on the
other hand. More precisely, let $\delta \ge 1$, $Z_i(\theta_n)$, $1 \le i \le \delta$, be $\delta$ independent
observations of $f(\theta_n)$, $\check Y_n$ be the approximation of $f(\theta_n)$ defined by

(3)  $\check Y_n = \frac{1}{\delta}\sum_{i=1}^{\delta} Z_i(\theta_n),$

and let the approximation algorithm for $\mu$ be defined as

(4)  $\mu_{n+1} = (1 - \tilde a_n)\mu_n + \tilde a_n \check Y_n.$

We prove that the sequence $\mu_n$ defined in this way still converges almost
surely to $\mu$. Moreover, we study the joint weak convergence rate of $\theta_n$ and
$\mu_n$ defined by (1) and (4), respectively. We prove in particular that if the
step sizes in (1) and (4) are chosen such that $(a_n) = (a_0 n^{-1})$, with adequate
conditions on $a_0$, $(c_n) = (c_0 n^{-1/6})$, $c_0 > 0$, and $(\tilde a_n) = (\tilde a_0 n^{-1})$, $\tilde a_0 > 1/2$, then
$(\theta_n)$ converges with its optimal rate $n^{1/3}$, and $(\mu_n)$ with the parametric rate
$\sqrt{n}$. Moreover, choosing $\tilde a_0 = 1$ leads to the minimum asymptotic variance of
$(\mu_n)$: when $(\tilde a_n) = (n^{-1})$, the algorithm (4) is asymptotically efficient. Note
that this case corresponds to the case

    $\mu_{n+1} = \frac{1}{n}\sum_{k=1}^{n}\check Y_k = \frac{1}{\delta n}\sum_{i \in \{1,\dots,\delta\},\, k \in \{1,\dots,n\}} Z_i(\theta_k).$
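To make the interplay between (1) and (4) concrete, here is a minimal one-dimensional simulation. It is a sketch under assumptions that are not in the paper: the toy regression function $f(x) = \mu - (x - \theta)^2$, Gaussian noise, unit constants in the step sizes, and all function and variable names.

```python
import random

def run_kw_with_companion(theta=1.0, mu=2.0, n_iter=5000, delta=2, sigma=0.5, seed=1):
    """Run (1) and (4) jointly for the toy function f(x) = mu - (x - theta)^2, d = 1."""
    rng = random.Random(seed)

    def z(x):
        # Noisy observation Z(x) of f(x)
        return mu - (x - theta) ** 2 + rng.gauss(0.0, sigma)

    theta_n, mu_n = 0.0, 0.0
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n                 # step size of (1)
        c_n = n ** (-1.0 / 6.0)       # finite-difference width
        a_tilde_n = 1.0 / n           # step size of (4)
        # delta fresh observations taken at theta_n itself, as in (3)
        y_check = sum(z(theta_n) for _ in range(delta)) / delta
        # gradient approximation Y_n from two further observations
        y_n = (z(theta_n + c_n) - z(theta_n - c_n)) / (2.0 * c_n)
        mu_n = (1.0 - a_tilde_n) * mu_n + a_tilde_n * y_check
        theta_n = theta_n + a_n * y_n
    return theta_n, mu_n

if __name__ == "__main__":
    print(run_kw_with_companion())
```

The observations used for the size estimate are separate from those used for the gradient, so the width `c_n` can be tuned for the location estimate alone.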
The striking aspect of our result on (4) is that, whereas approximation of
the size of the maximum of a regression function is typically a nonparametric
problem, and although the stochastic approximation algorithm (4) uses
the approximation $\theta_n$ of the location of the maximum of the regression function
(which itself does not converge with the parametric rate), the convergence
rate we obtain for the sequence $\mu_n$ is the parametric rate $\sqrt{n}$. This
is explained by the fact that although $\mu_n$ depends (through $\check Y_n$) on $\theta_n$,
the quantity which actually is involved in the convergence rate of $(\mu_n)$ is
$\|\theta_n - \theta\|^2$, and, for suitable choices of $(a_n)$ and $(c_n)$, this quantity goes to
zero faster than $1/\sqrt{n}$. [Of course, this is still true in the framework of the
double algorithm (1) and (2), but in this case the convergence rate of $(\mu_n)$
depends on $(c_n)$ and is less than $\sqrt{n}$.]

Now, as is well known, the choice of the step size $(a_n) = (a_0 n^{-1})$ in (1)
is the one which leads to the optimal convergence rate of $\theta_n$, but it induces
conditions on $a_0$ which are difficult to handle because of dependence on an
unknown parameter [see (9) in the sequel]. The well-known approach used
to obtain optimal convergence rates for stochastic approximation algorithms
without a tedious condition on the step size is to use the averaging principle
independently introduced by Ruppert [32] and Polyak [28]. Their averaging
procedure, which has been widely discussed and extended (see, among many
others, Yin [35], Delyon and Juditsky [5], Polyak and Juditsky [29], Kushner
and Yang [18], Le Breton [19], Le Breton and Novikov [20], Dippon and
Renz [7, 8] and Pelletier [27]), allows one to obtain asymptotically efficient
algorithms, that is, algorithms which not only converge at the optimal rate,
but also have an optimal asymptotic covariance matrix. This procedure
consists in (i) running the approximation algorithm by using slower step sizes
and (ii) computing a suitable average of the approximations obtained in (i).

Let us now give our scheme to efficiently approximate $\theta$ and $\mu$
simultaneously. First, we apply the averaging principle to the approximating
algorithm (1) of $\theta$ by proceeding as follows. Let the step size $(a_n)$ in (1) satisfy
$\lim_{n\to\infty} n a_n = \infty$, let the sequence $(\theta_k)$ be defined by the algorithm (1) and
set

(5)  $\bar\theta_n = \frac{1}{\sum_{k=1}^{n} c_k^2}\sum_{k=1}^{n} c_k^2\,\theta_k.$

It is well known that the sequence $(\bar\theta_n)$ is asymptotically efficient (see, e.g.,
[8]). Then, to approximate $\mu$ efficiently, we can just set $(\tilde a_n) = (n^{-1})$ in
(4) since this algorithm is asymptotically efficient (see the comments below
Theorem 2). However, when adding observations of f, it seems more natural
to take the observations at the point $\bar\theta_n$ (rather than at $\theta_n$) since $\bar\theta_n$ converges
to $\theta$ faster than $\theta_n$ does. That is the reason why we let $\delta \ge 1$, $Z_i(\bar\theta_n)$,
$1 \le i \le \delta$, be $\delta$ independent observations of $f(\bar\theta_n)$, $\bar Y_n$ be the approximation of
$f(\bar\theta_n)$ defined by

(6)  $\bar Y_n = \frac{1}{\delta}\sum_{i=1}^{\delta} Z_i(\bar\theta_n),$

and let the approximation algorithm for $\mu$ be defined as

(7)  $\mu_{n+1} = \left(1 - \frac{1}{n}\right)\mu_n + \frac{1}{n}\bar Y_n.$

The consistency of $\mu_n$ defined by (7) is obvious; we study the joint weak
asymptotic behavior of $\bar\theta_n$ and $\mu_n$ defined by (5) and (7). We prove in
particular that by setting $(c_n) = (c_0 n^{-1/6})$ in (1), we obtain simultaneously the
asymptotic efficiency of both sequences $(\bar\theta_n)$ and $(\mu_n)$.
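Step (ii) of the averaging procedure can be carried out recursively, without storing the past iterates. The sketch below computes running weighted averages online; the choice of weights $c_k^2$ in the usage example is an assumption made for the illustration, and the function name and sample values are hypothetical.

```python
def online_weighted_average(values, weights):
    """Return the running weighted averages of `values` for positive `weights`."""
    averages = []
    total_weight = 0.0
    current = 0.0
    for value, weight in zip(values, weights):
        total_weight += weight
        # Move the current average toward the new value by its relative weight
        current += (weight / total_weight) * (value - current)
        averages.append(current)
    return averages

if __name__ == "__main__":
    thetas = [0.2, 0.9, 1.1, 0.95]                         # stand-ins for theta_1, ..., theta_4
    weights = [k ** (-1.0 / 3.0) for k in range(1, 5)]     # c_k^2 with c_k = k**(-1/6)
    print(online_weighted_average(thetas, weights)[-1])
```

The last entry equals the batch weighted mean, so the averaged iterate can be updated at each step at constant cost.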

Let us finally mention that, in the case where no additional observations 
are taken to approximate /U, we can of course also average the algorithm 
(1). However, we shall point out that when the only parameter of interest 
in the double algorithm (1) and (2) is fi, it is preferable not to do so. As a 
matter of fact, we show there are possible choices of (a n ) for which there is 
no tedious condition on ao, and which lead to better convergence rates for 
(fin) than those which can be reached by averaging 9 n . 

2. Assumptions and main results. Let us first define the class of positive
sequences that will be used in the statement of our assumptions.

Definition 1. Let $\alpha \in \mathbb{R}$ and $(v_n)$ be a nonrandom positive sequence.
We say that $(v_n) \in \mathcal{GS}(\alpha)$ if

(8)  $\lim_{n\to\infty} n\left[1 - \frac{v_{n-1}}{v_n}\right] = \alpha.$

Condition (8) was introduced by Galambos and Seneta [12] to define
regularly varying sequences (see also [2]). Typical sequences in $\mathcal{GS}(\alpha)$ are, for
$b \in \mathbb{R}$, $n^{\alpha}(\log n)^{b}$, $n^{\alpha}(\log\log n)^{b}$, and so on.
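Condition (8), read as the requirement that $n[1 - v_{n-1}/v_n]$ tends to $\alpha$, is easy to probe numerically. The sketch below evaluates that quantity at a large index for two sample sequences; the function name and the test sequences are hypothetical choices for the illustration.

```python
import math

def gs_ratio(v, n):
    """Return n * (1 - v(n-1)/v(n)), the quantity whose limit defines the index alpha."""
    return n * (1.0 - v(n - 1) / v(n))

if __name__ == "__main__":
    n = 10 ** 6
    # v_n = n**(-1/6): the ratio is close to -1/6
    print(gs_ratio(lambda k: k ** (-1.0 / 6.0), n))
    # v_n = n * log(n): the ratio tends to 1, but only at speed 1/log(n)
    print(gs_ratio(lambda k: k * math.log(k), n))
```

Logarithmic factors do not change the index, but they slow the convergence of the ratio considerably.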

Set

    $W^{+}_{n,i} = Z(\theta_n + c_n e_i) - f(\theta_n + c_n e_i),$
    $W^{-}_{n,i} = Z(\theta_n - c_n e_i) - f(\theta_n - c_n e_i),$
    $\check W_{n,i} = Z_i(\theta_n) - f(\theta_n),$
    $\bar W_{n,i} = Z_i(\bar\theta_n) - f(\bar\theta_n).$

(The notation $\check W_{n,i}$ (resp. $\bar W_{n,i}$) is useful only in the case $(\mu_n)$ is defined
by (4) [resp. by (7)].) In order to state our assumptions in a compact way,
we introduce the sequence $(b_n)$ defined as

    $(b_n) = \begin{cases} (c_n), & \text{in the case } (\mu_n) \text{ is defined by (2)}, \\ 0, & \text{in the case } (\mu_n) \text{ is defined by (4) or by (7)}, \end{cases}$

and set

    $U_{n,i} = \begin{cases} 0, & \text{in the case } (\mu_n) \text{ is defined by (2)}, \\ \check W_{n,i}, & \text{in the case } (\mu_n) \text{ is defined by (4)}, \\ \bar W_{n,i}, & \text{in the case } (\mu_n) \text{ is defined by (7)}. \end{cases}$

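The assumptions to which we shall refer in the sequel are the following.

(A1) $\lim_{n\to\infty} \theta_n = \theta$ a.s.

(A2) f is three-times continuously differentiable in a neighborhood of $\theta$,
where the Hessian $D^2 f(\theta)$ of f at $\theta$ is negative definite with maximal
eigenvalue $-L^{(\theta)} < 0$.

(A3) Let $\mathcal{G}_n$ be the $\sigma$-field spanned by $\{W^{+}_{m,i}, W^{-}_{p,j}, U_{q,k},\ 1 \le i,j \le d,\ 1 \le k \le \delta,\ 1 \le m,p,q \le n-1\}$.

(i) $W^{+}_{n,i}$, $W^{-}_{n,j}$ and $U_{n,k}$ ($i, j \in \{1,\dots,d\}$, $k \in \{1,\dots,\delta\}$) are
independent conditionally on $\mathcal{G}_n$.

(ii) For some $\sigma > 0$, $\mathrm{Var}(Z|X = x) = \sigma^2$ for all $x \in \mathbb{R}^d$, while, for
some $m > 2$, $\sup_{x \in \mathbb{R}^d} E(|Z|^m | X = x) < \infty$.

(A4) (i) There exists $\alpha \in\, ]\max\{1/2, 2/m\}, 1]$ such that $(a_n) \in \mathcal{GS}(-\alpha)$.

(ii) There exists $\tau \in\, ]0, \alpha/2[$ such that $(c_n) \in \mathcal{GS}(-\tau)$.

(iii) $\lim_{n\to\infty} n a_n \in\, ]\max\{\frac{1-2\tau}{2L^{(\theta)}}, \frac{2\tau}{L^{(\theta)}}\}, \infty]$.

(iv) There exists $\tilde\alpha \in\, ]\max\{1/2, 2/m\}, 1]$ such that $(\tilde a_n) \in \mathcal{GS}(-\tilde\alpha)$.

(v) • In the case $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 = 0$, we have $\lim_{n\to\infty} \tilde a_n^{-1/2} a_n \log(\sum_{k=1}^{n} a_k)/c_n^2 = 0$ and $\lim_{n\to\infty} \tilde a_n^{-1/2} c_n^4 = 0$.

• In the case $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 \in\, ]0,\infty]$, we have $\sum \tilde a_n b_n^4 < \infty$ and
$\lim_{n\to\infty} a_n \log(\sum_{k=1}^{n} a_k)/c_n^4 = 0$.

(vi) $\lim_{n\to\infty} n \tilde a_n \in\, ]\frac{1}{2}, \infty]$.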

Comments on the assumptions. (1) Theorem 3 in [1] ensures that (A1)
holds under (A2)-(A4) and the following additional conditions: (i) $\alpha + \tau > 1$
and $2(\alpha - \tau) > 1$; (ii) $D^2 f$ is bounded; (iii) $\forall \delta > 0$, $\sup_{\|x - \theta\| \ge \delta} f(x) < f(\theta)$;
(iv) $\forall \varepsilon > 0$, $\exists \rho(\varepsilon) > 0$ such that $\|x - \theta\| \ge \varepsilon \Rightarrow \|\nabla f(x)\| \ge \rho(\varepsilon)$. Let
us underline that the conditions (i) on $\alpha$ and $\tau$ are satisfied as soon as
$\alpha \in\, ]5/6, 1]$ and $\tau \in [1/6, 1/4]$, which include the most interesting choices
of step sizes, as we shall see later on. Let us also mention that similar
conditions, but which are less restrictive on $\alpha$ and $\tau$, can be found in [22] and
[13]. Another kind of condition, with particular emphasis on control theory
applications, is given in [9, 17, 21]. The approach in these three references is
to associate the approximation algorithm (1) with a deterministic differential
equation in terms of which conditions are given to ensure (A1).

(2) Assumptions (A4)(i)-(iii) are the conditions on the step sizes required
to establish the weak convergence rate of $\theta_n$; assumptions (A4)(iv)-(vi) are
the additional ones needed for the consistency and for the weak convergence
rate of $\mu_n$.

(3) Condition (A4)(iii) [resp. (A4)(vi)] requires $a_n^{-1} = O(n)$ [resp. $\tilde a_n^{-1} = O(n)$]
and, in the case $(a_n) = (a_0 n^{-1})$ [resp. $(\tilde a_n) = (\tilde a_0 n^{-1})$],

(9)  $a_0 > \max\left\{\frac{1 - 2\tau}{2L^{(\theta)}}, \frac{2\tau}{L^{(\theta)}}\right\}$

(resp. $\tilde a_0 > 1/2$). Set $\log_1(n) = \log n$ and, for $j \ge 1$, $\log_{j+1}(n) = \log[\log_j(n)]$.
Our conditions allow the use of the step size $(a_n) = (a_0 [\log_p(n)]\, n^{-1})$
introduced by Koval and Schwabe [16]; this step size has the advantage of leading to
convergence rates very close to the ones obtained by using $(a_0 n^{-1})$, without
requiring the tedious condition (9) on $a_0$.

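(4) Assumption (A4)(v) is in particular satisfied as soon as the following
conditions hold:

    If $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 = 0$, then $\frac{\tilde\alpha}{8} < \tau < \frac{\alpha}{2} - \frac{\tilde\alpha}{4}$.

    If $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 \in\, ]0,\infty]$, then $\frac{1 - \tilde\alpha}{4} < \tau < \frac{\alpha}{4}$.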

Our first result is the following proposition, which states the consistency
of $\mu_n$ in the case $\mu_n$ is defined either by (2) or by (4).

Proposition 1. Let $\mu_n$ be defined either by (2) or by (4), and assume
(A1)-(A3) and (A4)(i)-(v) are satisfied. Then we have $\lim_{n\to\infty} \mu_n = \mu$ a.s.

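In order to state the weak convergence rate of $(\theta_n^T, \mu_n)^T$, we set

    $\xi_1 = (1 - 2\tau)\lim_{n\to\infty}(n a_n)^{-1}, \qquad \xi_2 = 4\tau\lim_{n\to\infty}(n a_n)^{-1},$
    $\tilde\xi_1 = \lim_{n\to\infty}(n \tilde a_n)^{-1}, \qquad \tilde\xi_2 = 4\tau\lim_{n\to\infty}(n \tilde a_n)^{-1},$

(10)  $\Sigma^{(\theta)} = -\frac{\sigma^2}{4}\left[D^2 f(\theta) + \frac{\xi_1}{2} I_d\right]^{-1},$

(11)  $\Delta^{(\theta)} = -\frac{1}{6}\left[D^2 f(\theta) + \frac{\xi_2}{2} I_d\right]^{-1}\left\{\frac{\partial^3 f}{\partial x_i^3}(\theta)\right\}_{1 \le i \le d},$

(12)  $\Sigma^{(\mu)} = \frac{\sigma^2}{\delta(2 - \tilde\xi_1)},$

(13)  $\Delta^{(\mu)} = \frac{2}{\delta(2 - \tilde\xi_2)}\sum_{i \in S}\frac{\partial^2 f}{\partial x_i^2}(\theta),$

where $\sigma^2$ is defined in (A3), and where $I_d$ denotes the d x d identity matrix.
Let us underline that assumption (A4) implies that $\xi_1, \xi_2 \in [0, 2L^{(\theta)}[$ and
$\tilde\xi_1, \tilde\xi_2 \in [0, 2[$; the parameters $\Sigma^{(\theta)}$, $\Delta^{(\theta)}$, $\Sigma^{(\mu)}$ and $\Delta^{(\mu)}$ are thus well
defined.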

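We now state the joint weak convergence rate of $\theta_n$ and $\mu_n$ in the case
where $\mu_n$ is defined by the algorithm (2).

Theorem 1. Let $(\mu_n)$ be defined by (2), and assume that (A1)-(A4)
hold.

(1) If $\lim_{n\to\infty} a_n^{-1} c_n^6 = \infty$ and if $\lim_{n\to\infty} \tilde a_n^{-1} c_n^4 = \infty$, then

    $\begin{pmatrix} c_n^{-2}(\theta_n - \theta) \\ c_n^{-2}(\mu_n - \mu) \end{pmatrix} \xrightarrow{P} \begin{pmatrix} \Delta^{(\theta)} \\ \Delta^{(\mu)} \end{pmatrix}.$

(2) If there exists $\gamma_1 \ge 0$ such that $\lim_{n\to\infty} a_n^{-1} c_n^6 = \gamma_1$ and if $\lim_{n\to\infty} \tilde a_n^{-1} c_n^4 = \infty$, then

    $\begin{pmatrix} \sqrt{a_n^{-1} c_n^2}\,(\theta_n - \theta) \\ c_n^{-2}(\mu_n - \mu) \end{pmatrix} \xrightarrow{\mathcal{D}} \begin{pmatrix} Z \\ \Delta^{(\mu)} \end{pmatrix},$

where Z is $\mathcal{N}(\sqrt{\gamma_1}\,\Delta^{(\theta)}, \Sigma^{(\theta)})$-distributed.

(3) If $\lim_{n\to\infty} a_n^{-1} c_n^6 = \infty$ and if there exists $\gamma_2 \ge 0$ such that $\lim_{n\to\infty} \tilde a_n^{-1} c_n^4 = \gamma_2$, then

    $\begin{pmatrix} c_n^{-2}(\theta_n - \theta) \\ \sqrt{\tilde a_n^{-1}}\,(\mu_n - \mu) \end{pmatrix} \xrightarrow{\mathcal{D}} \begin{pmatrix} \Delta^{(\theta)} \\ Z' \end{pmatrix},$

where Z' is $\mathcal{N}(\sqrt{\gamma_2}\,\Delta^{(\mu)}, \Sigma^{(\mu)})$-distributed.

(4) If there exist $\gamma_1 \ge 0$ and $\gamma_2 \ge 0$ such that $\lim_{n\to\infty} a_n^{-1} c_n^6 = \gamma_1$ and
$\lim_{n\to\infty} \tilde a_n^{-1} c_n^4 = \gamma_2$, then

    $\begin{pmatrix} \sqrt{a_n^{-1} c_n^2}\,(\theta_n - \theta) \\ \sqrt{\tilde a_n^{-1}}\,(\mu_n - \mu) \end{pmatrix} \xrightarrow{\mathcal{D}} \mathcal{N}\left(\begin{pmatrix} \sqrt{\gamma_1}\,\Delta^{(\theta)} \\ \sqrt{\gamma_2}\,\Delta^{(\mu)} \end{pmatrix}, \begin{pmatrix} \Sigma^{(\theta)} & 0 \\ 0 & \Sigma^{(\mu)} \end{pmatrix}\right).$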

Comments on Theorem 1.

(1) As is already well known, the optimal convergence rate of $(\theta_n)$ is
obtained by choosing $(a_n) = (a_0 n^{-1})$, $a_0$ satisfying (9), and $(c_n) = (n^{-1/6})$.
In this framework, the best convergence rate of $(\mu_n)$ is $n^{1/3}$; it is obtained
in the following ways:

• either $(\tilde a_n)$ is chosen such that $\lim_{n\to\infty} \tilde a_n^{-1} n^{-2/3} = \infty$, the convergence
rate of $(\mu_n)$ being then given by part (2) of Theorem 1,

• or $(\tilde a_n) = (n^{-2/3})$, the convergence rate of $(\mu_n)$ being then given by part
(4) of Theorem 1.

(2) The optimal convergence rate of $(\mu_n)$ is close to (but less than)
$\sqrt{n/\log\log n}$. More precisely, let $(v_n) \in \mathcal{GS}(0)$ be such that $\lim_{n\to\infty} v_n = \infty$.
For $(\mu_n)$ to converge with the rate $\sqrt{n/(v_n \log\log n)}$, one must choose
$(a_n) = (a_0 n^{-1})$, $a_0$ satisfying (9), and

• either $(\tilde a_n) = (n^{-1})$ and $(c_n) = (v_n^{1/4}[\log\log n]^{1/4} n^{-1/4})$, the convergence
rate of $(\mu_n)$ being then given by part (2) of Theorem 1,

• or $(\tilde a_n) = (n^{-1} v_n \log\log n)$ and $(c_n) = O(\tilde a_n^{1/4})$, the convergence rate of
$(\mu_n)$ being then given by part (4) of Theorem 1.

In this framework, the best convergence rate of $(\theta_n)$ is $n^{1/4} v_n^{1/4} [\log\log n]^{1/4}$.

(3) The tedious condition (9) on $a_0$ can be avoided by choosing $(a_n) =
(n^{-1}\log_p n)$. The convergence rate of $(\mu_n)$ is then close to (but less than)
$\sqrt{n/(\log_p n \log\log n)}$. More precisely, let $(v_n) \in \mathcal{GS}(0)$ be such that
$\lim_{n\to\infty} v_n = \infty$. For $(\mu_n)$ to converge with the rate $\sqrt{n/(v_n \log_p n \log\log n)}$, one can
choose

• either $(\tilde a_n) = (n^{-1})$ and $(c_n) = (v_n^{1/4}[\log_p n]^{1/4}[\log\log n]^{1/4} n^{-1/4})$, the
convergence rate of $(\mu_n)$ being then given by part (2) of Theorem 1,

• or $(\tilde a_n) = (n^{-1} v_n \log_p n \log\log n)$ and $(c_n) = O(\tilde a_n^{1/4})$, the convergence rate
of $(\mu_n)$ being then given by part (4) of Theorem 1.

In this case, the best convergence rate of $(\theta_n)$ is $n^{1/4} v_n^{1/4} [\log\log n]^{1/4} [\log_p n]^{-1/4}$.

The double algorithm (1) and (2) has thus two disadvantages: (i) it is not
possible to choose a sequence $(c_n)$ such that the convergence rates of $(\theta_n)$
and $(\mu_n)$ are simultaneously optimal; (ii) the sequence $(\mu_n)$ cannot converge
at the parametric rate.

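We now state the joint weak convergence rate of $\theta_n$ and $\mu_n$ in the case
additional observations are made for the computation of $\mu_n$, that is, in the
case $(\mu_n)$ is defined by (4).

Theorem 2. Let $(\mu_n)$ be defined by (4), and assume that (A1)-(A4)
hold.

(1) If $\lim_{n\to\infty} a_n^{-1} c_n^6 = \infty$, then

    $\begin{pmatrix} c_n^{-2}(\theta_n - \theta) \\ \sqrt{\tilde a_n^{-1}}\,(\mu_n - \mu) \end{pmatrix} \xrightarrow{\mathcal{D}} \begin{pmatrix} \Delta^{(\theta)} \\ Z' \end{pmatrix},$

where Z' is $\mathcal{N}(0, \Sigma^{(\mu)})$-distributed.

(2) If there exists $\gamma_1 \ge 0$ such that $\lim_{n\to\infty} a_n^{-1} c_n^6 = \gamma_1$, then

    $\begin{pmatrix} \sqrt{a_n^{-1} c_n^2}\,(\theta_n - \theta) \\ \sqrt{\tilde a_n^{-1}}\,(\mu_n - \mu) \end{pmatrix} \xrightarrow{\mathcal{D}} \mathcal{N}\left(\begin{pmatrix} \sqrt{\gamma_1}\,\Delta^{(\theta)} \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma^{(\theta)} & 0 \\ 0 & \Sigma^{(\mu)} \end{pmatrix}\right).$

Comments on Theorem 2. Set $(a_n) = (a_0 n^{-1})$, $a_0$ satisfying (9), $(c_n) =
(c_0 n^{-1/6})$, $c_0 > 0$, and $(\tilde a_n) = (\tilde a_0 n^{-1})$, $\tilde a_0 > 1/2$. Part (2) of Theorem 2
ensures that

    $\begin{pmatrix} n^{1/3}(\theta_n - \theta) \\ \sqrt{n}\,(\mu_n - \mu) \end{pmatrix} \xrightarrow{\mathcal{D}} \mathcal{N}\left(\begin{pmatrix} c_0^2\,\Delta^{(\theta)} \\ 0 \end{pmatrix}, \begin{pmatrix} a_0 c_0^{-2}\,\Sigma^{(\theta)} & 0 \\ 0 & \tilde a_0\,\Sigma^{(\mu)} \end{pmatrix}\right).$

For this choice, $\theta_n$ converges with its optimal rate $n^{1/3}$, and $\mu_n$ converges
with the parametric rate $\sqrt{n}$. Moreover, let us note that the asymptotic
variance $\tilde a_0 \Sigma^{(\mu)} = \tilde a_0^2 [2\tilde a_0 - 1]^{-1} \sigma^2 \delta^{-1}$ reaches its minimum $\sigma^2/\delta$ for $\tilde a_0 = 1$;
the algorithm (4) is thus asymptotically efficient when $(\tilde a_n) = (n^{-1})$.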
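The minimization behind this remark is elementary. The lines below spell it out, assuming that for $(\tilde a_n) = (\tilde a_0 n^{-1})$ the asymptotic variance of $\sqrt{n}(\mu_n - \mu)$ is $\tilde a_0^2[2\tilde a_0 - 1]^{-1}\sigma^2\delta^{-1}$; the function name $v$ is introduced here only for the illustration.

```latex
\[
v(\tilde a_0) = \frac{\tilde a_0^{2}}{2\tilde a_0 - 1}\,\frac{\sigma^{2}}{\delta},
\qquad
v'(\tilde a_0) = \frac{2\tilde a_0(\tilde a_0 - 1)}{(2\tilde a_0 - 1)^{2}}\,\frac{\sigma^{2}}{\delta},
\qquad \tilde a_0 > \tfrac{1}{2}.
\]
% v' < 0 on ]1/2, 1[ and v' > 0 on ]1, \infty[, so v is minimal at \tilde a_0 = 1,
% where v(1) = \sigma^2/\delta, the variance of the mean of \delta observations per step.
```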

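To state the joint asymptotic behavior of $\bar\theta_n$ and $\mu_n$ defined in (5) and
(7), we need to introduce the notation

(14)  $R^{(\theta)} = \frac{1}{6}\left\{\frac{\partial^3 f}{\partial x_i^3}(\theta)\right\}_{1 \le i \le d}$

as well as the following additional assumption.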



(A5) (i) lim - — r = 00, 

v ; u ™iog(Efc=i«fc) 

r ELi«fclog(E, fc =ia,) n 
11) hm , = 0, 

(in) lim na?c~ 6 = 00. 

v ' n—KX> n n 



Theorem 3. Let $(\mu_n)$ be defined by (7), and assume that (A1)-(A5)
hold with $(\tilde a_n) = (n^{-1})$.

(1) If $\lim_{n\to\infty} n c_n^6 = \infty$, then

( c-H0n ~0)\v(- [D*f(9)]-^<» ' 



V Vn(iin - fi) J ~" I vl 4r 2' 



where Z' is $\mathcal{N}(0, \sigma^2/\delta)$-distributed.
(2) If $\lim_{n\to\infty} n c_n^6 = 0$, then



Vn(fin - fi) J \ ' 
(3) If there exists $\gamma_1 > 0$ such that $\lim_{n\to\infty} n c_n^6 = \gamma_1$, then

fn~cl{e n -ey 

-2 7l 1/3 [ J D 2 /(^)]- 1 J R (e) V ( ^^[^/W]- 2 

a 2 /S 








Part (3) of Theorem 3 corresponds to the case where both $\bar\theta_n$ and $\mu_n$ are
asymptotically efficient: they converge with their respective optimal rates
$n^{1/3}$ and $n^{1/2}$, and their asymptotic covariance matrix is optimal (see, e.g.,
[8] for the optimality of the asymptotic covariance matrix of $\bar\theta_n$). To obtain
the result of the third part of Theorem 3, one must choose $(c_n) = (c_0 n^{-1/6})$,
$c_0 > 0$, whereas different choices of the step size $(a_n)$ are possible. For
instance, one may choose:

• either $(a_n) = (a_0 n^{-\alpha})$, $a_0 > 0$, $\alpha \in\, ]5/6, 1[$,

• or $(a_n) = (a_0 n^{-1}[\log n]^{\alpha})$, $a_0 > 0$, $\alpha > 0$,

• or $(a_n) = (a_0 n^{-1}[\log\log n]^{\alpha})$, $a_0 > 0$, $\alpha > 1$.

To conclude this section, let us mention that, in the case no additional
observations are made to approximate $\mu$, averaging the algorithm (1) reduces
the optimal convergence rate of the sequence $(\mu_n)$ then defined by (2). As a
matter of fact, to average $\theta_n$, the step size $(a_n)$ in (1) must be chosen such
that

(15) lim na n l logo n = 00 

n— >oo 

[see assumption (A5)]. If the step size $(\tilde a_n)$ in (2) is set equal to $(n^{-1})$, then
the combination of (A4) and (15) induces the condition $\lim_{n\to\infty} \tilde a_n^{-1} c_n^4 = \infty$,
so that, in view of Theorem 1, $c_n^{-2}(\mu_n - \mu)$ converges to a degenerate
distribution. Moreover, in this case the convergence rate $(c_n^{-2})$ is necessarily
less than $\sqrt{n/(\log_2 n)^2}$. On the other hand, it is possible to choose $(\tilde a_n)$
such that $\tilde a_n^{-1/2}(\mu_n - \mu)$ converges to a Gaussian distribution. But, in this
case also, because of the combination of (A4) and (15), the convergence
rate $(\tilde a_n^{-1/2})$ is necessarily less than $\sqrt{n/(\log_2 n)^2}$. So, if the only parameter
of interest in the double algorithm (1) and (2) is $\mu$, it is preferable not to
average $\theta_n$: choosing in (1) the step size $(a_n) = (n^{-1}\log_p n)$ (with $p \ge 2$)
introduced by Koval and Schwabe [16] allows one to get rid of the tedious
condition (9) on $a_0$ and to obtain better convergence rates for $(\mu_n)$ than
those which can be achieved by averaging $\theta_n$.



3. Proofs. Let us first state some elementary properties of the classes
$\mathcal{GS}(\alpha)$ of sequences that will be used throughout the proofs.

• If $(u_n) \in \mathcal{GS}(\alpha)$ and $(v_n) \in \mathcal{GS}(\beta)$, then $(u_n v_n) \in \mathcal{GS}(\alpha + \beta)$.
• If $(u_n) \in \mathcal{GS}(\alpha)$, then for all $c \in \mathbb{R}$, $(u_n^c) \in \mathcal{GS}(c\alpha)$.
• If $(u_n) \in \mathcal{GS}(\alpha)$, then for all $\varepsilon > 0$ and n large enough, $n^{\alpha - \varepsilon} \le u_n \le n^{\alpha + \varepsilon}$.
• If $(u_n) \in \mathcal{GS}(\alpha)$ and $\sum_n u_n = \infty$, then $\lim_{n\to\infty} n u_n [\sum_{k=1}^{n} u_k]^{-1} = 1 + \alpha$.
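The last property, which compares a sequence with its partial sums, can be checked numerically on pure power sequences. The sketch below does so for $u_k = k^{\alpha}$; the function name and the chosen exponents are hypothetical and serve only the illustration.

```python
def partial_sum_ratio(alpha, n):
    """Return n * u_n / (u_1 + ... + u_n) for the power sequence u_k = k**alpha."""
    total = sum(k ** alpha for k in range(1, n + 1))
    return n * n ** alpha / total

if __name__ == "__main__":
    # For a divergent series the ratio approaches 1 + alpha
    print(partial_sum_ratio(-1.0 / 3.0, 100000))  # close to 2/3
    print(partial_sum_ratio(1.0, 100000))         # close to 2
```

The divergence of the series matters: for a summable sequence the partial sums stay bounded and the ratio tends to zero instead.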



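Now, set

(16)  $R^{(\theta)}_{n+1} = \frac{1}{2c_n}\{f(\theta_n + c_n e_i) - f(\theta_n - c_n e_i)\}_{1 \le i \le d} - \nabla f(\theta_n),$

(17)  $R^{(\mu)}_{n+1} = \begin{cases} \frac{1}{\delta}\sum_{i \in S}\{f(\theta_n + c_n e_i) + f(\theta_n - c_n e_i)\} - \mu, & \text{if } (\mu_n) \text{ is defined by (2)}, \\ f(\theta_n) - \mu, & \text{if } (\mu_n) \text{ is defined by (4)}, \end{cases}$

(18)  $\varepsilon^{(\theta)}_{n+1} = \frac{1}{2}\{W^{+}_{n,i} - W^{-}_{n,i}\}_{1 \le i \le d},$

and

(19)  $\varepsilon^{(\mu)}_{n+1} = \begin{cases} \frac{1}{\delta}\sum_{i \in S}\{W^{+}_{n,i} + W^{-}_{n,i}\}, & \text{if } (\mu_n) \text{ is defined by (2)}, \\ \frac{1}{\delta}\sum_{i=1}^{\delta}\check W_{n,i}, & \text{if } (\mu_n) \text{ is defined by (4)}. \end{cases}$

The recursive equation (1) can then be rewritten as

(20)  $\theta_{n+1} = \theta_n + a_n[\nabla f(\theta_n) + R^{(\theta)}_{n+1}] + \frac{a_n}{c_n}\varepsilon^{(\theta)}_{n+1},$

and the algorithms (2) and (4) as

(21)  $\mu_{n+1} = \mu_n + \tilde a_n[(\mu - \mu_n) + R^{(\mu)}_{n+1}] + \tilde a_n \varepsilon^{(\mu)}_{n+1}.$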

These equations (20) and (21) can be viewed as particular stochastic
approximation algorithms used for the search of a zero of a given function [of the
function $\nabla f$ for (20) and of the function $x \mapsto \mu - x$ for (21)]. In Section 3.1,
we state some preliminary results on stochastic approximation algorithms
used for the search of zeros of a function h that will be applied several times
in the sequel; the proof of these preliminary results can be found in the
technical report arXiv:math.ST/0610487v1. In Section 3.2 we establish an upper
bound on the almost sure convergence rate of $\theta_n$, which will in particular
be used to prove the strong consistency of $\mu_n$. In Section 3.3 we first prove
Proposition 1, and then give an upper bound on the almost sure convergence
rate of $\mu_n$ defined either by (2) or by (4). Section 3.4 is devoted to the proof
of Theorems 1 and 2, and Section 3.5 to the proof of Theorem 3.

3.1. Some preliminary results on stochastic approximation algorithms.
We consider the stochastic approximation algorithm

(22)  $Z_{n+1} = Z_n + \gamma_n[h(Z_n) + r_{n+1}] + \sigma_n \varepsilon_{n+1},$

where the random variables $Z_0$, $(r_n)_{n \ge 1}$ and $(\varepsilon_n)_{n \ge 1}$ are defined on a
probability space $(\Omega, \mathcal{A}, P)$ equipped with a filtration $\mathcal{F} = (\mathcal{F}_n)$, and the step sizes
$(\gamma_n)$ and $(\sigma_n)$ are two positive and nonrandom sequences that go to zero.

Stochastic approximation algorithms [such as (22)] used for the search
of zeros of a function $h : \mathbb{R}^d \to \mathbb{R}^d$ have been widely studied under various
assumptions; see [9, 23, 25] and the references therein. The object of this
section is not to give the most general existing result on (22), but only to
precisely state the results we shall use in the sequel for the study of (20)
and (21); in particular, the hypotheses below are not the most general ones,
but are appropriate in our framework.

(H1) There exists $z^* \in \mathbb{R}^d$ such that $\lim_{n\to\infty} Z_n = z^*$ a.s.

(H2) h is differentiable at $z^*$, its Jacobian matrix H at $z^*$ is symmetric,
negative definite with maximal eigenvalue $-L < 0$, and there exists a
neighborhood of $z^*$ in which $h(z) = H(z - z^*) + O(\|z - z^*\|^2)$.

(H3) (i) $E(\varepsilon_{n+1}|\mathcal{F}_n) = 0$ and there exists $m > 2$ such that
$\sup_{n \ge 0} E(\|\varepsilon_{n+1}\|^m | \mathcal{F}_n) < \infty$.

(ii) There exists a nonrandom, positive definite matrix $\Gamma$ such that
$\lim_{n\to\infty} E(\varepsilon_{n+1}\varepsilon_{n+1}^T | \mathcal{F}_n) = \Gamma$ a.s.

(H4) $r_{n+1} = R_{n+1} + O(\|Z_n - z^*\|^2)$ a.s., and there exist $\rho \in \mathbb{R}^d$ and a
nonrandom sequence $(u_n)$ such that:

(i) $\lim_{n\to\infty} \sqrt{u_n}\,R_{n+1} = \rho$ a.s.

(ii) There exists $u^* > 0$ such that $(u_n) \in \mathcal{GS}(u^*)$.

(H5) (i) There exist $\alpha \in\, ]\max\{1/2, 2/m\}, 1]$ and $\beta > \alpha/2$ such that $(\gamma_n) \in
\mathcal{GS}(-\alpha)$ and $(\sigma_n) \in \mathcal{GS}(-\beta)$.

(ii) $\lim_{n\to\infty} n\gamma_n \in\, ]\max\{\frac{2\beta - \alpha}{2L}, \frac{u^*}{2L}\}, \infty]$, where L and $u^*$ are defined
in (H2) and (H4)(ii), respectively.

The asymptotic behavior of the algorithm (22) is given by the behavior of
the sequences $(L_n)$ and $(\Delta_n)$ defined by

    $L_{n+1} = e^{(\sum_{k=1}^{n}\gamma_k)H}\sum_{k=1}^{n} e^{-(\sum_{j=1}^{k}\gamma_j)H}\sigma_k\,\varepsilon_{k+1},$

    $\Delta_{n+1} = (Z_{n+1} - z^*) - L_{n+1}.$

In order to prove Proposition 1 and Theorems 1 and 2, we shall apply several
times the following two lemmas.

Lemma 1 [A.s. upper bound of $(L_n)$]. Under hypotheses (H2), (H3) and
(H5), we have $\|L_n\| = O\bigl(\sqrt{\gamma_n^{-1}\sigma_n^2\log(\sum_{k=1}^{n}\gamma_k)}\bigr)$ a.s.

Lemma 2 [A.s. convergence rate of $(\Delta_n)$]. Under hypotheses (H1)-(H5),
we have $\lim_{n\to\infty}\sqrt{u_n}\,\Delta_n = -\bigl[H + \tfrac{u^*}{2}\lim_{n\to\infty}(n\gamma_n)^{-1} I_d\bigr]^{-1}\rho$ a.s.

Let us mention that, in particular, the combination of Lemmas 1 and 2
gives straightforwardly the following upper bound of the a.s. convergence
rate of $Z_n$ toward $z^*$:

(23)  $\|Z_n - z^*\| = O\left(\sqrt{\gamma_n^{-1}\sigma_n^2\log\Bigl(\sum_{k=1}^{n}\gamma_k\Bigr)} + u_n^{-1/2}\right)$ a.s.

To end this section, we now state a result concerning the averaged stochas- 
tic approximation algorithm derived from (22); we set 

— 1 n 

Ek=ii k a k k=i 

and assume the following additional condition holds: 
(H6) (i) lim ™ log( 2 i7fc) =oo. 

\j2Zk=ilk a k 

(hi) The sequence (u n ) defined in assumption (H4) satisfies 
lim nu n a 2 n = oc, lim ^=ilk^kV = Q 



n— too n— »oo 



Turn 



The asymptotic behavior of (Z n ) is given by the behavior of the sequences 
(A n ) and (H n ) defined by 

1 n 

2^fc=i lk a k k=i 

—n+l = (Zn — z*) — A n+ i. 

In Section 3.5, we shall apply several times the following lemma, which gives 
the asymptotic almost sure behavior of (E n ). 

Lemma 3 [A.s. convergence rate of $(\Xi_n)$]. Assume that (H1)-(H6) hold.
(1) If $\lim_{n\to\infty}[n\gamma_n^2\sigma_n^{-2}]^{-1/2}[\sum_{k=1}^{n}\gamma_k^2\sigma_k^{-2}u_k^{-1/2}] = 0$, then

    $\lim_{n\to\infty}\sqrt{n\gamma_n^2\sigma_n^{-2}}\;\Xi_n = 0$ a.s.

(2) If the sequence $([n\gamma_n^2\sigma_n^{-2}]^{1/2}[\sum_{k=1}^{n}\gamma_k^2\sigma_k^{-2}u_k^{-1/2}]^{-1})$ is bounded, then

    $\lim_{n\to\infty}\frac{n\gamma_n^2\sigma_n^{-2}}{\sum_{k=1}^{n}\gamma_k^2\sigma_k^{-2}u_k^{-1/2}}\;\Xi_n = -(1 - 2\alpha + 2\beta)H^{-1}\rho$ a.s.

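3.2. Upper bound of the a.s. convergence rate of $\theta_n$. Set

(24)  $s_n = \sum_{k=1}^{n} a_k,$

(25)  $G = D^2 f(\theta),$

(26)  $L^{(\theta)}_{n+1} = e^{s_n G}\sum_{k=1}^{n} e^{-s_k G} a_k c_k^{-1}\varepsilon^{(\theta)}_{k+1},$

(27)  $\Delta^{(\theta)}_{n+1} = (\theta_{n+1} - \theta) - L^{(\theta)}_{n+1}.$

The application of Lemma 1 to the recursive equation (20) [with $h = \nabla f$,
$(\gamma_n) = (a_n)$ and $(\sigma_n) = (a_n c_n^{-1})$] gives straightforwardly the following lemma.

Lemma 4 [A.s. upper bound of $(L^{(\theta)}_n)$]. Under assumptions (A2)(ii),
(A3) and (A4)(i)-(iii), we have $\|L^{(\theta)}_n\| = O(\sqrt{a_n c_n^{-2}\log s_n})$ a.s.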



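Now, let $R^{(\theta)}_{n+1,i}$ denote the ith coordinate of $R^{(\theta)}_{n+1}$ [defined in (16)]; we
have

    $R^{(\theta)}_{n+1,i} = \frac{1}{2c_n}\{[f(\theta_n + c_n e_i) - f(\theta_n)] - [f(\theta_n - c_n e_i) - f(\theta_n)]\} - \frac{\partial f}{\partial x_i}(\theta_n)$

    $= \frac{1}{2c_n}\left\{\left[c_n\frac{\partial f}{\partial x_i}(\theta_n) + \frac{c_n^2}{2}\frac{\partial^2 f}{\partial x_i^2}(\theta_n) + \frac{c_n^3}{6}\frac{\partial^3 f}{\partial x_i^3}(\theta_n) + o(c_n^3)\right] - \left[-c_n\frac{\partial f}{\partial x_i}(\theta_n) + \frac{c_n^2}{2}\frac{\partial^2 f}{\partial x_i^2}(\theta_n) - \frac{c_n^3}{6}\frac{\partial^3 f}{\partial x_i^3}(\theta_n) + o(c_n^3)\right]\right\} - \frac{\partial f}{\partial x_i}(\theta_n)$

    $= \frac{c_n^2}{6}\frac{\partial^3 f}{\partial x_i^3}(\theta_n) + o(c_n^2),$

and thus, in view of assumptions (A1) and (A2)(i), $\lim_{n\to\infty} c_n^{-2} R^{(\theta)}_{n+1} = R^{(\theta)}$
a.s., where $R^{(\theta)}$ is defined in (14). The application of Lemma 2 [with $(\sqrt{u_n}) =
(c_n^{-2})$ and $\rho = R^{(\theta)}$] then gives the following lemma.

Lemma 5 [A.s. convergence rate of $(\Delta^{(\theta)}_n)$]. Under assumptions (A1)-
(A3) and (A4)(i)-(iii), we have $\lim_{n\to\infty} c_n^{-2}\Delta^{(\theta)}_n = \Delta^{(\theta)}$ a.s., where $\Delta^{(\theta)}$ is
defined in (11).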
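The second-order bias of the symmetric difference quotient, which drives the limit of $c_n^{-2}R^{(\theta)}_{n+1}$, can be observed numerically in dimension one. The sketch below uses the sine function as a hypothetical test function; it is not part of the paper.

```python
import math

def central_difference_bias(f, df, x, c):
    """Return (f(x + c) - f(x - c)) / (2c) - f'(x), the noise-free bias of the quotient."""
    return (f(x + c) - f(x - c)) / (2.0 * c) - df(x)

if __name__ == "__main__":
    x, c = 0.3, 1e-2
    bias = central_difference_bias(math.sin, math.cos, x, c)
    # For f = sin the third derivative is -cos, so bias / c^2 is close to -cos(x) / 6
    print(bias / c ** 2, -math.cos(x) / 6.0)
```

Dividing the bias by $c^2$ isolates the factor one sixth of the third derivative, which is the coordinate-wise form of the limit used in the proof.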
Let us note that the combination of Lemmas 4 and 5 ensures that, under
assumptions (A1)-(A3) and (A4)(i)-(iii),

(28)  $\|\theta_n - \theta\| = O(\sqrt{a_n c_n^{-2}\log s_n} + c_n^2)$ a.s.

3.3. On the a.s. asymptotic behavior of $\mu_n$ defined by (2) or (4). In
the case $\mu_n$ is defined either by (2) or by (4), the a.s. convergence of $\mu_n$
(resp. the a.s. convergence rate of $\mu_n$) is obtained by applying the Robbins-
Monro theorem (resp. Lemmas 1 and 2) to the recursive equation (21). Since
the $R^{(\mu)}_{n+1}$ term in (21) depends on $\theta_n$ [see (17)], we first upper bound this
perturbation term by using the results of the previous section. To this end,
we first note that in the case $(\mu_n)$ is defined by (2), we have



R 



0*) 

n+l 



7E 



/(*») + cn^(e n ) + fg^(e n ) + o{cl) 



+ 



f(0n) 



Of 



d 2 f 



Cn dxi {dn)+ 2 dxl 



(On)+O(c 2 n ] 



fl 



(29) 



d 2 f, 



/(*)] 



d 2 f, 



fJ2^2( d n) + o(c 2 n ) + O(\\0 n -e\ 



dxj 



a n log s n 



a.s. 



[where the last equality follows from the application of (28)]; in the case 
(fj, n ) is defined by (4), similar computations give 

'a n logs r 



(30) 



R 



0*) 



n+l 



o 



+ c 



a.s. 



In view of assumption (A4)(v), we deduce that: 
" U4 - 0, then 



If lim rwoo a n i &,; 



(31) 



lim Ja^R^ 



n+l 







a.s. 



If lirrin^oo a n b^ G ]0, 00], then 



(32) 



a.s. 



We can now prove Proposition 1 and give an upper bound on the a.s.
convergence rate of $\mu_n$.

3.3.1. Proof of Proposition 1.

• In the case $\lim_{n\to\infty}\tilde a_n^{-1} b_n^4 = 0$, we have, in view of (31), $\tilde a_n|R^{(\mu)}_{n+1}|^2 = O(\tilde a_n^2)$
a.s., and thus, in view of (A4)(iv), $\sum \tilde a_n|R^{(\mu)}_{n+1}|^2 < \infty$ a.s.

• In the case $\lim_{n\to\infty}\tilde a_n^{-1} b_n^4 \in\, ]0,\infty]$, we have, in view of (32), $\tilde a_n|R^{(\mu)}_{n+1}|^2 =
O(\tilde a_n b_n^4)$ a.s., and thus, in view of (A4)(v), $\sum \tilde a_n|R^{(\mu)}_{n+1}|^2 < \infty$ a.s.

In both cases, the application of the Robbins-Monro theorem (see, e.g., [9],
page 61) ensures that $\sum \tilde a_n(\mu_n - \mu)^2 < \infty$ a.s. Since $\sum \tilde a_n = \infty$ [see (A4)(vi)],
it follows that $\lim_{n\to\infty}\mu_n = \mu$ a.s.

3.3.2. Upper bound on the a.s. convergence rate of p n defined by (2) or 
(4). Set 



(33) ~s n = £ 



CLk, 



k=l 



n 



(34) L^^e^^^A 

k=l 

(35) A^^-^-L^ 

[where $\varepsilon^{(\mu)}_n$ is defined in (19)]. The application of Lemma 1 to the recursive equation (21) [with $h : x \mapsto \mu - x$, $(\gamma_n) = (\tilde a_n)$ and $(\sigma_n) = (\tilde a_n)$] gives straightforwardly the following lemma.

Lemma 6 [A.s. upper bound of $(L^{(\mu)}_n)$]. Under assumptions (A3), (A4)(iv) and (A4)(vi), we have $|L^{(\mu)}_n| = O(\sqrt{\tilde a_n \log \tilde s_n})$ a.s.

Moreover: 

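• if $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 = 0$, then, in view of (31), the application of Lemma 2 [with $(\sqrt{u_n^{-1}}) = (\sqrt{\tilde a_n^{-1}})$ and $\rho = 0$] gives the first part of Lemma 7 below;

• if $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 \in\, ]0,\infty]$, then, in view of (32), the application of Lemma 2 [with $(\sqrt{u_n^{-1}}) = (b_n^{-2})$ and $\rho = \frac{1}{2d}\sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2}(\theta)$] gives the second part of Lemma 7 below.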

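Lemma 7 [A.s. convergence rate of $(\Delta^{(\mu)}_n)$]. Let (A1)-(A4) hold.

(1) If $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 = 0$, then $\lim_{n\to\infty} \sqrt{\tilde a_n^{-1}}\, \Delta^{(\mu)}_n = 0$ a.s.

(2) If $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 \in\, ]0,\infty]$, then $\lim_{n\to\infty} b_n^{-2} \Delta^{(\mu)}_n = \Delta^{(\mu)}$ a.s., where $\Delta^{(\mu)}$ is defined in (13).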
Although only Lemmas 6 and 7 will be used in the sequel, we state here the 
following proposition, which is obtained as a straightforward combination of 
these two lemmas, and which is of independent interest. 

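Proposition 2 [A.s. upper bound of $(\mu_n - \mu)$]. Under (A1)-(A4), we have:

(1) If $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 = 0$, then $|\mu_n - \mu| = O(\sqrt{\tilde a_n \log \tilde s_n})$ a.s.

(2) If $\lim_{n\to\infty} \tilde a_n^{-1} b_n^4 \in\, ]0,\infty]$, then $|\mu_n - \mu| = O(\sqrt{\tilde a_n \log \tilde s_n} + b_n^2)$ a.s.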
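
As a worked illustration of the first bound, consider the step size $\tilde a_n = n^{-1}$, the choice that appears in the proof of Lemma 10. This is an illustrative computation that assumes this step size is compatible with (A4) and that $\tilde s_n$ is the partial sum of the $\tilde a_k$, as in (33).

```latex
\tilde s_n = \sum_{k=1}^{n} \frac{1}{k} = \log n + O(1),
\qquad
\sqrt{\tilde a_n \log \tilde s_n}
  = \sqrt{\frac{\log\bigl(\log n + O(1)\bigr)}{n}}
  = \sqrt{\frac{\log\log n}{n}}\,\bigl(1 + o(1)\bigr).
```

Under these assumptions the bound reads $|\mu_n - \mu| = O(\sqrt{n^{-1}\log\log n})$ a.s., which is the order of magnitude familiar from the law of the iterated logarithm.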

3.4. Proof of Theorems 1 and 2. In view of the definition of $L^{(\theta)}_n$, $\Delta^{(\theta)}_n$, $L^{(\mu)}_n$ and $\Delta^{(\mu)}_n$ [see (26), (27), (34) and (35), resp.], Theorems 1 and 2 are straightforward consequences of the combination of Lemmas 5 and 7 together with the following lemma.

Lemma 8 [Weak convergence rate of $(L^{(\theta)}_n, L^{(\mu)}_n)$]. Under (A2)-(A4),



where and are defined in (10) and (12), respectively. 



Proof. Set 



(n) _ ( J^ n e SnG \ ' / e-^ akC -l £ f) 



» 



^a n e~ Sn J fc=i V e Afe a fc e^ 

For each $n$, $M^{(n)} = (M^{(n)}_j)_{j \ge 1}$ is a martingale whose predictable quadratic variation satisfies

with 

A llB = a-M^E feV^E[ef [ e f ] T |^ 1 ]e^ GT }e- GT , 



>fc=l 



^4,n = a- 1 e- 2 - E«^ E [[4 M fl^i] 

U=i 
Now, under assumption (A3), we have, in view of (18) and (19), 

e 2 h 



e[4 s) [4Ti&-i] 



E[[eiY4% fe _i]=0 a.s. 



It follows that ^2,n = and, by application of Lemma 4 in [24], linin^oo Ai^ n 
and lim^oo A^ n = E^. We thus obtain 

n lim(M)W = (^ ) E ° M 

Moreover, in view of assumption (A3), we have 

n 

E E [ii^ n) - M iiiri^-i] 
fc=i 



a.s. 



\fe=i fe=i 

0(«#> +«;(?)) a.s. 



with 



7t 



fc=i 



fc=i 



Now, since $(a_n^{-1} c_n^2) \in \mathcal{GS}(\alpha - 2\tau)$, we note that



w 



(6) 
n+l 



1 „2 n m / 2 



°n+l C n+l 



On cS 



-mLWan+t JO) , m/2 

e u> n -t- a n+1 



a - 2r 
1 + — + o 



n + l + 1 



1 



m/2 



[1 - mL^a n+ i + o(a n+ i)]u4 0) + a™_j^ 
[1 + + o(a n+1 )] m/2 [l - mLWa n+1 + o(a n+1 )]w^ + a^{\ 

[1 - mL (9) a„ + i + o(a n+ i)]u; n 9) + a™H 



1 H 5 — fln+1 + °\ a n+i) 



(fl) m/2 
S T a n+l ■ 



kn+i + o(a n+ i) 

Set g]0,LW - £ (e) / 2 [; for n lar S e enough, we get 

<(l-A(% n+1 )\wW\+a™H 



w 



(0) 

n+l 
(0) 

and the application of Lemma 4.1.1 in [9] ensures that $\lim_{n\to\infty} w^{(\theta)}_n = 0$. In
the same way, since (a" 1 ) G QS{— fi), we have 
-m/2 



n+l 



-ma n+1 f M ) + ~m/2 



n+l 



1 



a 



+ o 



1 



-m/2 



[1 - ma n+1 + o(5 n+ i)]u;^ ) + 5™+i 

/ v | -L \ 1 1 | J. / 

[1 - ^a n+ i + o(a n+1 )]~ m/2 [l - ma n+ i + o(a n+ x)\w$ + a™H 

[1 - ma n+1 + o(a n+1 )]w^ + a^+l 



1 H o — + °( a n+l; 



/ £(*0 \ 
1 - ml 1 — Ja n+ i + o(a n+lj 



M + „ m /2 



«A W + a 



n+l' 



from which we deduce that $\lim_{n\to\infty} w^{(\mu)}_n = 0$. It thus follows that

$$
\sum_{k=1}^{n} E\bigl[\|M^{(n)}_k - M^{(n)}_{k-1}\|^m \,\big|\, \mathcal{G}_{k-1}\bigr] = o(1) \quad \text{a.s.},
$$

and the application of Lyapounov's theorem gives 

which concludes the proof of Lemma 8. □ 
3.5. Proof of Theorem 3. Set 



A 



n+l 



1 r~ l ST r <W 



2^k=i c k 



A. 



fc=i 
(«) 

n+l' 



E (M) 
-fc+1 



A 



n+l 



J n+1 



<5^ 



i=l 



fe=l 



(jin+l -fi)- AjJ ls 



where ejj. and G are defined in (18) and (25), respectively. Theorem 3 follows 
straightforwardly from the combination of the three following lemmas, which 
give the a.s. convergence rate of (S^ ), of (En ) and the weak convergence 
rate of ( A^ , A^ ) , respectively. 
Lemma 9 [A.s. convergence rate of $(\Xi^{(\theta)}_n)$]. Let the assumptions of Theorem 3 hold, and recall that $R^{(\theta)}$ is defined in (14).



i n ^oo nc\ = 0, then lim 



oo, then lim^ooc" 2 = -( ^)G~ 1 R^ 
~i? ] = a.s. 
— 7, then lim n 



(1) // lim^oo nc; 

(2) //lim 

(3) If there exists 7 > stzc/i i/iai linin^oo nc n 
-2 7 1 /3G- 1 i?W a.s. 



a.s. 



00 V " ,t 'n '— 'i 



Lemma 10 [A.s. convergence rate of (E^)]- Under the assumptions of 
Theorem 3 we have linin^oo 



^l+l = a.s. 



Lemma 11 [Weak convergence rate of (A^A^; 
Hons of Theorem 3, we have 

( (a 2 (l-2r) 



Under the assump- 



(/■<) 



■ AA 



0. 



V 



-G- 



0^ 







V 



y/ 



Proof of Lemma 9. Set $(\gamma_n) = (a_n)$, $(\sigma_n) = (a_n c_n^{-1})$, $(u_n) = (c_n^4)$ and $e \in\, ]0, (1-2\tau)/2[$. Since



En _ 2_— 2„ —1 6 



n7^ CTn 2 



o 



'nc~ 



n + nc, 



'net 



ol) 



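we can apply Lemma 3 to the recursive equation (20). Assumption (A4)(v) implies $\lim_{n\to\infty} n c_n^4 = \infty$, and thus $\sum_n c_n^4 = \infty$. Since $(c_n^4) \in \mathcal{GS}(-4\tau)$, we have

$$
\lim_{n\to\infty} \frac{n c_n^4}{\sum_{k=1}^{n} c_k^4} = 1 - 4\tau.
\tag{36}
$$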
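
The limit (36) is an instance of a standard property of regularly varying sequences: for $v_n = n^{-\gamma}$ with $\gamma < 1$, the ratio $n v_n / \sum_{k \le n} v_k$ tends to $1 - \gamma$. The Python sketch below checks this numerically, assuming that (36) reads $\lim_n n c_n^4 / \sum_{k=1}^{n} c_k^4 = 1 - 4\tau$ and taking the pure power sequence $c_k = k^{-\tau}$. The sequence, the value of $\tau$ and the function name `partial_sum_ratio` are hypothetical choices made for the illustration; the paper only requires $(c_n^4) \in \mathcal{GS}(-4\tau)$.

```python
def partial_sum_ratio(tau, n):
    """Return n * c_n**4 / sum_{k<=n} c_k**4 for the power sequence c_k = k**(-tau)."""
    total = 0.0
    for k in range(1, n + 1):
        total += k ** (-4.0 * tau)
    return n * n ** (-4.0 * tau) / total


if __name__ == "__main__":
    tau = 0.1  # hypothetical value with 4 * tau < 1
    for n in (10**2, 10**4, 10**6):
        print(n, partial_sum_ratio(tau, n), "target:", 1 - 4 * tau)
```

The convergence is slow when $4\tau$ is close to 1, so the agreement with $1 - 4\tau$ is best checked for small values of $\tau$.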



Consider the case $\lim_{n\to\infty} n c_n^6 \in\, ]0,\infty]$. We then have $\tau \le 1/6$ and it follows from (36) that



net, 



O(l). 



The application of the second part of Lemma 3 then ensures that 



lim 

n— >oo 



2^k=i c k 



-(1-2t)G~ 1 R^ a.s., 



and, applying (36) again, we obtain 



(37) 



lim c" 



1 -2t 
1 -4t 



G-^W a.s., 
which gives the first part of Lemma 9. Note that if $\lim_{n\to\infty} n c_n^6 \in\, ]0,\infty[$, then $\tau = 1/6$; the third part of Lemma 9 follows straightforwardly from (37).

Now, consider the case $\lim_{n\to\infty} n c_n^6 = 0$. Set $e \in\, ]0, (1-2\tau)/2[$; using the fact that $(c_n^4) \in \mathcal{GS}(-4\tau)$ with $\tau \le 1/4$ and applying (36) in the case $\tau \ne 1/4$, we obtain



0(1). 



The application of the first part of Lemma 3 then ensures that lim r 
a.s., which concludes the proof of Lemma 9. □ 



*oo V '"-n 1 — w 



Proof of Lemma 10. We have 

8 



inM I 



1 1 

-E s^2 z ' 
n k=ii° i=i 

i n 

-£[/(**)-/(*)] 



fc=i 



fe=i 



of^EoiA^ir+n^in 



(A) 



k=l 



By applying for instance Corollary 6.4.25 of [10], we get 



n 




E °k e k+l 


< 


k=l 





Kk=l 



and thus 

HASill^OCM^Ioglogn) 
The application of Lemma 9 then ensures that 



a.s. 



IsKil = ( - X>£ + (^ir 1 log log fc]) 



E c ii°gi°g E c i a - s - 



a.s. 



= 0(c n + (nc^) 1 log log n) a.s. 
In view of (A4)(v) (with $b_n = c_n$ and $\tilde a_n = n^{-1}$), Lemma 10 follows. □
Proof of Lemma 11. Set 



M 



(n) 



/ 



V 



E 

\k=l 



-1/2 



4 



GT 





„-l/2 



E 



^=1 



Cfce^ 



In view of (A3), for each $n$, $M^{(n)} = (M^{(n)}_j)_{j \ge 1}$ is a martingale whose predictable quadratic variation satisfies

\ 



and we have 



(n) 



2 







V 



a.s. 



EE[|i^r-M- j ill \o k -i) 

k=l 



o 



J24 

k=l 



-i -m/2 n 



E c r + 



n 



l-m/2 



a.s. 



fc=i 



= o(l) a.s. 

The application of Lyapounov's theorem then ensures that 
/ 



Mi n) 



\ 



\ fe=i 



\ 


f / 




o, 


/ 





-G~ 











T/ 



and Lemma 11 follows from the fact that, since $(c_n^2) \in \mathcal{GS}(-2\tau)$ with $\tau < 1/2$, we have $\lim_{n\to\infty} n c_n^2 \bigl[\sum_{k=1}^{n} c_k^2\bigr]^{-1} = 1 - 2\tau$. □